Overview: Script for processing environmental data from water stations. Sources and information on the data are below. I am looking at salinity, pH, dissolved oxygen, water temperature, and air temperature where available. To remove outliers I first drop data points flagged as failed in the QC tests. Then I use a rolling median, removing data points more than +/- 2 standard deviations from the rolling median (a 2-4 hour window, depending on the variable and sampling frequency). Finally, I calculate hourly medians, since data were collected at different frequencies. This sets me up to do analysis between sites and with field data. Sites are graphed in order from closest to the Delta to the mouth of the bay.

Removing points “by hand”: after trying various methods (different filters, different rolling-median windows, removing “suspect” data, which removed too many real points, and checking whether the points were caught by the hourly median), certain data points that were clearly not right were removed by visual assessment. (For example, Fort Point has a data point at ~0 ppt salinity in the summer, when no other data point for months is even close to that salinity, so it was removed.) For now I’m doing this after I calculate the hourly median, but maybe it should be done before the hourly median is calculated to make sure those values are accurate? Instead of deleting the row, I want to replace the value with NA so it still marks a data collection and doesn’t skew the hourly median calculation. TO DO: as written, the code below still removes these rows instead of replacing them with NA.
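A sketch of that NA-replacement approach (illustrative only; it assumes the cc.1hr.med data frame and the month_yr column created later in this script, and reuses the China Camp cutoffs from below):

```r
# Replace hand-flagged values with NA instead of dropping rows, so the
# record still shows that a sample was collected at that hour.
# Sketch only -- not yet wired into the cleaning steps below.
bad <- (cc.1hr.med$salinity < 1  & cc.1hr.med$month_yr == "2019-02") |
       (cc.1hr.med$salinity < 15 & cc.1hr.med$month_yr == "2019-12")
cc.1hr.med$salinity[which(bad)] <- NA
```

Using which(bad) rather than bad itself avoids trouble when the logical vector contains NAs.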

Notes on data:
-Accessed and downloaded all data on September 30, 2020, except for EOS, which was downloaded 1/21/2021 from a new link that includes QC tests
-Downloaded all data available from 1/1/2017-12/31/2019
-China Camp http://cdmo.baruch.sc.edu/aqs/zips.cfm Data logged every 15 minutes. Requested citation format: NOAA National Estuarine Research Reserve System (NERRS). System-wide Monitoring Program. Data accessed from the NOAA NERRS Centralized Data Management Office website: http://cdmo.baruch.sc.edu/; accessed 30 September 2020.
-Richardson Bay/NERR data. Data logged every 15 minutes.
-EOS http://erddap.cencoos.org/erddap/tabledap/tiburon-water-tibc1.html Data logged every 6 minutes. Time in UTC. pH and salinity available. No air temperature. (Old link, downloaded 9/30/2020 and no longer used: https://oceanview.pfeg.noaa.gov/erddap/tabledap/rtcctdRTCysi.html)
-Fort Point http://erddap.cencoos.org/erddap/tabledap/fort-point.html?time%2Csea_water_temperature%2Csea_water_temperature_qc_agg%2Cz&time%3E%3D2020-09-20T21%3A07%3A41Z&time%3C%3D2020-09-29T12%3A00%3A00Z. Time in UTC. Data logged every 6 minutes.

Raw data stored here: https://github.com/Cmwegener/thesis/tree/master/data/environmental/raw
Hourly medians: https://github.com/Cmwegener/thesis/tree/master/data/environmental/hourly_median

Set up

rm(list=ls())

library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)
library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes) 
library(cranlogs)
library(knitr)
library(openair)
library(lubridate)

Make a custom function that calculates the median, standard deviation (sd), and an approximate 95% interval, where x is a numeric vector. Here “hi” and “lo” are defined as 2 standard deviations from the median. This step only creates the function; it does not manipulate any data. You can change the equations to define hi and lo as 3 (or however many) sds away, and you can swap median for another metric such as mean. na.rm=TRUE drops any NA values. “ret” holds the new columns that are added when you run this on a data frame.

custom_stat_fun_2 <- function(x, na.rm = TRUE){
  m  <- median(x, na.rm = na.rm)
  s  <- sd(x, na.rm = na.rm)
  hi <- m + 2*s   # upper cutoff: median + 2 sd
  lo <- m - 2*s   # lower cutoff: median - 2 sd
  ret <- c(median = m, stdev = s, hi.95 = hi, lo.95 = lo)
  ret  # return the named vector
}
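A quick sanity check of the function on a made-up vector (values are illustrative only):

```r
x <- c(10, 12, 11, 13, 50)        # one obvious outlier
stats <- custom_stat_fun_2(x)
stats["median"]                   # 12
stats["hi.95"]                    # ~46.5, so 50 would be filtered out below
```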

####Salinity#### I’ll go through each step in more detail for the first variable, salinity. For the others my comments will be sparser, but only because I’m using the same approach for each one.

####China Camp####
Read in data. Subset (not strictly needed, but it’s a large df). A column named “date” with no time stamp is needed for a later step.

cc <-
  read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/china_camp.csv",
    header = TRUE
  )

cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))

cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))

cc<-subset(cc, select=c(date, datetime, Sal, F_Sal))

cc%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from China Camp: All data points",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).

Remove data that failed QC test

cc<-cc[!grepl("-3", cc$F_Sal),]  # grepl is safe even if no rows are flagged

cc%>%
  ggplot(aes(x=date, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from China Camp: Failed QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

Roll apply using the custom stat function created at the beginning. This needs a column named “date” with no time stamp in order to work. select = the parameter/column you want to apply the rolling statistics to, in this case salinity. FUN = the custom function created at the beginning. The width of the window depends on the frequency of data collection: for example, with data collected every 15 min, width = 672 is 7 days, 1344 is 14 days, and 2880 is 30 days. Here I am using a 4-hour window, which is a width of 16 (4 observations per hour for 4 hours, 4*4 = 16).

cc<-cc%>%
  tq_mutate(
    select= Sal,
    mutate_fun= rollapply,
    width= 16,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
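The window arithmetic above can be captured in a small helper (a hypothetical convenience function for illustration; it is not used elsewhere in this script):

```r
# Convert a window length in hours to a rollapply width,
# given the sampling interval in minutes.
width_for_window <- function(hours, minutes_per_obs) {
  hours * 60 / minutes_per_obs
}
width_for_window(4, 15)   # 16: China Camp / Richardson Bay (15-min data)
width_for_window(4, 6)    # 40: EOS / Fort Point (6-min data)
```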

Filter to remove values that are more than +/- 2 standard deviations from the rolling median

cc<-filter(cc, Sal <hi.95 & Sal >lo.95)

cc%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from China Camp: Failed QC points + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 9 rows containing missing values (geom_point).

Hourly median
For the timeAverage function to work, you need a POSIXct column named “date”, and this time it needs the time stamp, so we’ll add it back in. interval = frequency of data collection, in this case every 15 minutes.

cc<-subset(cc, select=-c(date))
cc$date<-as.POSIXct(cc$datetime, format = c("%m/%d/%Y %H:%M"))

cc.1hr.med<- timeAverage(cc,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 9 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.1hr.med<-subset(cc.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

#change column names
names(cc.1hr.med)[1] <- "datetime"
names(cc.1hr.med)[2] <- "date"
names(cc.1hr.med)[3]<-"salinity"

cc_sal<-cc.1hr.med %>% ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

cc_sal
## Warning: Removed 2898 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

cc.1hr.med$datetime<-as.Date(cc.1hr.med$datetime, format = c("%m/%d/%Y %H:%M"))

cc.1hr.med %>% ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 2898 rows containing missing values (geom_point).

Now I can see the month and year more easily. Removing points that don’t look right: Feb 2019 < 1, Dec 2019 < 15.

#adding month-year column just to make the next step of coding easier
cc.1hr.med$month_yr<-format(as.Date(cc.1hr.med$date), "%Y-%m")

#didn't convert it to a Date class, but it still works for filtering


cc.1hr.med<- cc.1hr.med[!(cc.1hr.med$salinity< 1 & cc.1hr.med$month_yr=="2019-02") &
                    !(cc.1hr.med$salinity<15 & cc.1hr.med$month_yr=="2019-12"),]


cc.1hr.med %>%ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 2898 rows containing missing values (geom_point).

Looks good. I could remove more points, but I’ll confirm first.

Save

write.csv(cc.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/cc_sal.csv")

####EOS pier ####

Set up

eos<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/eos.csv",
    header = TRUE
  )


eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))

eos<-subset(eos, select=c(date, time, sea_water_practical_salinity, sea_water_practical_salinity_qc_agg))

eos%>%
  ggplot(aes(x=date, y=sea_water_practical_salinity, color=sea_water_practical_salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Plot with a better view, since extreme values make it difficult to see what’s happening

eos%>%
  ggplot(aes(x=date, y=sea_water_practical_salinity, color=sea_water_practical_salinity))+
  geom_point(alpha=0.5)+ ylim(0,35)+
  labs(title="Salinity data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 2 rows containing missing values (geom_point).

Remove data that failed QC test, 4=failed

eos<-eos[!grepl("4", eos$sea_water_practical_salinity_qc_agg),]

eos%>%
  ggplot(aes(x=date, y=sea_water_practical_salinity, color=sea_water_practical_salinity))+
  geom_point(alpha=0.5)+ ylim(0,35)+
  labs(title="Salinity data from EOS research pier: data flagged fail removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Roll apply using the custom stat function. Data collected every 6 minutes, so a 4-hour window = width of 40.

eos<-eos%>%
  tq_mutate(
    select= sea_water_practical_salinity,
    mutate_fun= rollapply,
    width= 40,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter and graph

eos<-filter(eos, sea_water_practical_salinity <hi.95 & sea_water_practical_salinity >lo.95)


ggplot(data = eos, mapping = aes(x = date, y = sea_water_practical_salinity, color=sea_water_practical_salinity)) + geom_point()+
  labs(title="Salinity data from EOS research pier: Extreme + +/- 2sd data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Hourly median

#holder name
names(eos)[1] <- "datex"

#column named "date" with time stamp for hourly median to work
names(eos)[2] <- "date"
eos$date<-as.POSIXct(eos$date, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos.1hr.med<- timeAverage(eos,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.1hr.med<-subset(eos.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

#change column names 
names(eos.1hr.med)[1] <- "datetime"
names(eos.1hr.med)[2] <- "date"
names(eos.1hr.med)[3]<-"salinity"

eos_sal<-eos.1hr.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
       labs(title="Hourly median salinity from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
eos_sal
## Warning: Removed 2490 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

eos.1hr.med$datetime<-as.Date(eos.1hr.med$datetime, format = c("%m/%d/%Y %H:%M"))

eos.1hr.med %>% ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 2490 rows containing missing values (geom_point).

Remove points that don’t look right: Dec 2018 < 22

#adding month-year column just to make the next step of coding easier
eos.1hr.med$month_yr<-format(as.Date(eos.1hr.med$date), "%Y-%m")

#didn't convert it to a Date class, but it still works for filtering


eos.1hr.med<- eos.1hr.med[!(eos.1hr.med$salinity< 22 & eos.1hr.med$month_yr=="2018-12") ,]


eos.1hr.med %>%ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 2490 rows containing missing values (geom_point).

Format and save

write.csv(eos.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_sal.csv")

####Richardson Bay ####

Set up

rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))

rb<-subset(rb, select=c(date, datetime, Sal, F_Sal, DateTimeStamp))

rb%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Richardson Bay: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Remove flagged data that did not pass QAQC. An F_Sal value containing -3 indicates a failed test.

rb<-rb[!grepl("-3", rb$F_Sal),]


rb%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Richardson Bay: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Roll apply using the custom stat function. Data collected every 15 minutes, so a 4-hour window = width of 16.

rb<-rb%>%
  tq_mutate(
    select= Sal,
    mutate_fun= rollapply,
    width= 16,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter

rb<-filter(rb, Sal <hi.95 & Sal >lo.95)

rb%>%
  ggplot(aes(x=datetime, y=Sal, color=Sal))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Richardson Bay: Failed QC data + -/+2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 9 rows containing missing values (geom_point).

Hourly median

#holder name
names(rb)[1] <- "datex"

#column named "date" with time stamp for hourly median to work
names(rb)[2] <- "date"
rb$date<-as.POSIXct(rb$date, format = c("%Y-%m-%dT%H:%M:%SZ"))

rb.1hr.med<- timeAverage(rb,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 9 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
#change column names 
names(rb.1hr.med)[1] <- "date"
names(rb.1hr.med)[2] <- "datetime"
names(rb.1hr.med)[3]<-"salinity"

rb_sal<-rb.1hr.med%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity data from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
rb_sal
## Warning: Removed 3681 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

#needs to be Date for breaks to work
rb.1hr.med %>% ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 3681 rows containing missing values (geom_point).

Remove points that don’t look right: Dec 2017 < 25

#adding month-year column just to make the next step of coding easier
rb.1hr.med$month_yr<-format(as.Date(rb.1hr.med$date), "%Y-%m")

#didn't convert it to a Date class, but it still works for filtering


rb.1hr.med<- rb.1hr.med[!(rb.1hr.med$salinity< 25 & rb.1hr.med$month_yr=="2017-12") ,]


rb.1hr.med %>% ggplot(aes(x=datetime, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 3681 rows containing missing values (geom_point).

Format and save

write.csv(rb.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/rb_sal.csv")

####Fort Point####

Set up. For some reason I was getting an error later when both the date and datetime columns were converted to date classes, so unlike the other sites I’m only converting one here and will convert the other later. The method is still consistent with the other sites.

fp<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/fort_point.csv",
    header = TRUE
  )
 
fp$date<-as.Date(fp$date, format = c("%Y-%m-%d"))
names(fp)[names(fp) == "sea_water_practical_salinity"] <- "salinity"
names(fp)[names(fp) == "sea_water_practical_salinity_qc_agg"] <- "F_Sal"

fp<-subset(fp, select=c(datetime, date, salinity, F_Sal))

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: All Data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Remove flagged data, 4 indicates fail.

fp<-filter(fp, F_Sal!=4)

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Roll apply using the custom stat function. Data collected every 6 minutes, so 4 hours = 40 observations.

This is where I was getting an error when I had both the date and datetime columns converted, so I only converted the date column, since that’s what’s used at this step, and will convert the other when I use it. It didn’t do this before, so I’m not sure what’s changed.

fp<-fp%>%
  tq_mutate(
    select= salinity,
    mutate_fun= rollapply,
    width= 40,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
fp<-filter(fp, salinity <hi.95 & salinity >lo.95)

fp%>%
  ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  labs(title="Salinity data from Fort Point: Failed QC + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Hourly median

#holder name
names(fp)[2] <- "datex"

#column named "date" with time stamp for hourly median to work
names(fp)[1] <- "date"
fp$date<-as.POSIXct(fp$date, format = c("%Y-%m-%dT%H:%M:%SZ"))



fp.1hr.med<- timeAverage(fp,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 10 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
#change column names 
names(fp.1hr.med)[1] <- "datetime"
names(fp.1hr.med)[2] <- "date"

fp_sal<-ggplot(data = fp.1hr.med, mapping = aes(x = date, y = salinity, color=salinity)) +
  geom_point()+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity data from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
fp_sal
## Warning: Removed 5283 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

#needs to be Date for breaks to work
fp.1hr.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 5283 rows containing missing values (geom_point).

Remove points that don’t look right: Jun 2017 < 10, Apr 2018 < 5, Jul 2018 < 5

#adding month-year column just to make the next step of coding easier
fp.1hr.med$month_yr<-format(as.Date(fp.1hr.med$date), "%Y-%m")

#didn't convert it to a Date class, but it still works for filtering

fp.1hr.med<- fp.1hr.med[!(fp.1hr.med$salinity< 10 & fp.1hr.med$month_yr=="2017-06") &
                    !(fp.1hr.med$salinity<5 & fp.1hr.med$month_yr=="2018-04") &
                      !(fp.1hr.med$salinity<5 & fp.1hr.med$month_yr=="2018-07"),]

fp.1hr.med %>% ggplot(aes(x=date, y=salinity, color=salinity))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  geom_hline(yintercept=10, linetype="dashed", color = "red")+
  geom_hline(yintercept=20, linetype="dashed", color = "red")+
  labs(title="Hourly median salinity from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 5283 rows containing missing values (geom_point).

Format and save

write.csv(fp.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fp_sal.csv")

####pH####
Clear environment except for our custom function

rm(list=(ls()[ls()!="custom_stat_fun_2"]))

####China Camp #### Set up

cc<-
  read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/china_camp.csv",
    header = TRUE
  )

cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))

cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))

cc<-subset(cc, select=c(date, datetime, pH, F_pH))

cc%>%
  ggplot(aes(x=datetime, y=pH, color=pH))+
  geom_point(alpha=0.5)+
  labs(title="pH data from China Camp: All data points",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).

Remove data that failed QC test

cc<-cc[!grepl("-3", cc$F_pH),]

cc%>%
  ggplot(aes(x=date, y=pH, color=pH))+
  geom_point(alpha=0.5)+
  labs(title="pH data from China Camp: Failed QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

Rolling median, 2-hour window (width = 8 for 15-minute data)

cc<-cc%>%
  tq_mutate(
    select= pH,
    mutate_fun= rollapply,
    width= 8,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter to remove values that are more than +/- 2 standard deviations from the rolling median

cc<-filter(cc, pH <hi.95 & pH >lo.95)

cc%>%
  ggplot(aes(x=datetime, y=pH, color=pH))+
  geom_point(alpha=0.5)+
  labs(title="pH data from China Camp: Failed QC points + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 4 rows containing missing values (geom_point).

Hourly median

#change column names
names(cc)[1] <- "datex"
names(cc)[2] <- "date"
cc$date<-as.POSIXct(cc$date, format = c("%m/%d/%Y %H:%M"))

cc.1hr.med<- timeAverage(cc,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 4 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.1hr.med<-subset(cc.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

#change column names
names(cc.1hr.med)[1] <- "datetime"
names(cc.1hr.med)[2] <- "date"
names(cc.1hr.med)[3]<-"ph"

cc_ph<-cc.1hr.med %>% ggplot(aes(x=date, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median pH from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

cc_ph
## Warning: Removed 11268 rows containing missing values (geom_point).

Removing points that don’t look right: I don’t see any obvious points that need removing. Maybe the one at pH 9, but I’m not convinced; need to check.

Format and save

write.csv(cc.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/cc_ph.csv")

####EOS pier####
Set up

eos<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/eos.csv",
    header = TRUE
  )

names(eos)[names(eos) == "time"] <- "datetime"
eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))
eos$datetime<-as.POSIXct(eos$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos<-subset(eos, select=c(date, datetime, sea_water_ph_reported_on_total_scale, sea_water_ph_reported_on_total_scale_qc_agg))

names(eos)[names(eos) == "sea_water_ph_reported_on_total_scale"] <- "ph"
names(eos)[names(eos) == "sea_water_ph_reported_on_total_scale_qc_agg"] <- "ph_flag"


eos%>%
  ggplot(aes(x=datetime, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  labs(title="pH data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Remove data that failed QC test, 4=failed

eos<-eos[!grepl("4", eos$ph_flag),]

eos%>%
  ggplot(aes(x=date, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  labs(title="pH data from EOS research pier: data flagged fail removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Rolling median, 4-hour window (width = 40 for 6-minute data)

eos<-eos%>%
  tq_mutate(
    select= ph,
    mutate_fun= rollapply,
    width= 40,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter and graph

eos<-filter(eos, ph <hi.95 & ph >lo.95)


ggplot(data = eos, mapping = aes(x = datetime, y = ph, color=ph)) + geom_point()+
  labs(title="pH data from EOS research pier: +/- 2sd data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Hourly median

names(eos)[1] <- "datex"
names(eos)[2] <- "date"
eos$date<-as.POSIXct(eos$date, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos.1hr.med<- timeAverage(eos,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.1hr.med<-subset(eos.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(eos.1hr.med)[1] <- "datetime"
names(eos.1hr.med)[2] <- "date"

eos_ph<-eos.1hr.med %>% ggplot(aes(x=date, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median pH from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
eos_ph
## Warning: Removed 1025 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

eos.1hr.med$datetime<-as.Date(eos.1hr.med$datetime, format = c("%m/%d/%Y %H:%M"))

eos.1hr.med %>% ggplot(aes(x=datetime, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median pH from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 1025 rows containing missing values (geom_point).

Remove points that don’t look right: Jun 2019 < 7.25, Jul 2019 < 7.26, Aug 2019 < 7.25

#adding month-year column just to make the next step of coding easier
eos.1hr.med$month_yr<-format(as.Date(eos.1hr.med$date), "%Y-%m")

eos.1hr.med<- eos.1hr.med[!(eos.1hr.med$ph< 7.25 & eos.1hr.med$month_yr=="2019-06")&
                          !(eos.1hr.med$ph< 7.26 & eos.1hr.med$month_yr=="2019-07")&
                          !(eos.1hr.med$ph< 7.25 & eos.1hr.med$month_yr=="2019-08"),]


eos.1hr.med %>%ggplot(aes(x=datetime, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median pH from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 1025 rows containing missing values (geom_point).

Save

write.csv(eos.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_ph.csv")

####Richardson Bay####

Set up

rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))

rb<-subset(rb, select=c(date, datetime, pH, F_pH, DateTimeStamp))

rb%>%
  ggplot(aes(x=datetime, y=pH, color=pH))+
  geom_point(alpha=0.5)+
  labs(title="pH data from Richardson Bay: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Remove flagged data that did not pass QAQC.

rb<-rb[!grepl("-3", rb$F_pH),]
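One caveat with dropping flagged rows via `-grep()`: when the pattern matches nothing, `grep()` returns `integer(0)` and `df[-integer(0), ]` selects zero rows, silently emptying the data frame. `!grepl()` returns a logical vector and keeps everything in that case. A quick demonstration with toy flags:

```r
df <- data.frame(pH = c(7.9, 8.0), F_pH = c("<0>", "<0>"))

# No flag contains "-3", so grep() finds nothing...
idx <- grep("-3", df$F_pH)           # integer(0)
# ...and negative indexing with integer(0) drops every row
nrow(df[-idx, ])                     # 0

# grepl() gives FALSE for every row, so negation keeps them all
nrow(df[!grepl("-3", df$F_pH), ])    # 2
```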


rb%>%
  ggplot(aes(x=datetime, y=pH, color=pH))+
  geom_point(alpha=0.5)+
  labs(title="pH data from Richardson Bay: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Rolling median 4 hours

rb<-rb%>%
  tq_mutate(
    select= pH,
    mutate_fun= rollapply,
    width= 16,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter

rb<-filter(rb, pH <hi.95 & pH >lo.95)

rb%>%
  ggplot(aes(x=datetime, y=pH, color=pH))+
  geom_point(alpha=0.5)+
  labs(title="pH data from Richardson Bay: Failed QC data + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 8 rows containing missing values (geom_point).
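`custom_stat_fun_2` is defined earlier in the script; judging from the columns it produces (`median`, `stdev`, `hi.95`, `lo.95`), the screen keeps points within ±2 sd of a right-aligned rolling median. A self-contained base-R sketch of that idea, with a 5-point toy window standing in for the 16-point (4 h of 15-min data) window used here:

```r
# Toy series with one obvious spike
x <- c(8.0, 8.1, 8.0, 8.2, 9.9, 8.1, 8.0, 8.2, 8.1, 8.0)

# Right-aligned rolling statistic: NA until a full window is available
roll_stat <- function(v, width, fun) {
  sapply(seq_along(v), function(i) {
    if (i < width) NA else fun(v[(i - width + 1):i])
  })
}
med <- roll_stat(x, 5, median)
sd5 <- roll_stat(x, 5, sd)
hi  <- med + 2 * sd5   # hi.95
lo  <- med - 2 * sd5   # lo.95

# Keep points inside the +/- 2 sd band (as in the filter() step)
keep <- !is.na(med) & x < hi & x > lo
```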

Hourly median

names(rb)[1] <- "datex"
names(rb)[2] <- "date"

rb$date<-as.POSIXct(rb$date, format = c("%m/%d/%y %H:%M"))


rb.1hr.med<- timeAverage(rb,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 8 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb.1hr.med<-subset(rb.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(rb.1hr.med)[1] <- "datetime"
names(rb.1hr.med)[2] <- "date"
names(rb.1hr.med)[3] <- "ph"


rb_ph<-rb.1hr.med%>%
  ggplot(aes(x=date, y=ph, color=ph))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median pH data from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
rb_ph
## Warning: Removed 9013 rows containing missing values (geom_point).

Don’t see any points that need to be removed

Format and save

write.csv(rb.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/rb_ph.csv")

####Fort Point####
No pH data at this station.

####Dissolved oxygen#### Clear environment except for our custom function

rm(list=(ls()[ls()!="custom_stat_fun_2"]))

####China Camp #### Set up

cc<-
  read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/china_camp.csv",
    header = TRUE
  )

cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))
cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))

cc<-subset(cc, select=c(date, datetime, DO_mgl, F_DO_mgl))

cc%>%
  ggplot(aes(x=datetime, y=DO_mgl, color=DO_mgl))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from China Camp: All data points",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).

Remove data that failed QC test

cc<-cc[!grepl("-3", cc$F_DO_mgl),]

cc%>%
  ggplot(aes(x=date, y=DO_mgl, color=DO_mgl))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from China Camp: Failed QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

Rolling median 4 hr

cc<-cc%>%
  tq_mutate(
    select= DO_mgl,
    mutate_fun= rollapply,
    width= 16,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter to remove values more than 2 standard deviations from the rolling median

cc<-filter(cc, DO_mgl <hi.95 & DO_mgl >lo.95)

cc%>%
  ggplot(aes(x=datetime, y=DO_mgl, color=DO_mgl))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from China Camp: Failed QC + +/- 2sd points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 10 rows containing missing values (geom_point).

Hourly median

names(cc)[1]<-"datex"
names(cc)[2]<-"date"

cc$date<-as.POSIXct(cc$date, format = c("%m/%d/%Y %H:%M"))

cc.1hr.med<- timeAverage(cc,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 10 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.1hr.med<-subset(cc.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(cc.1hr.med)[1]<-"datetime"
names(cc.1hr.med)[2]<-"date"
names(cc.1hr.med)[3]<-"do"
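`timeAverage()` is from the openair package; as a cross-check, an hourly median can be sketched in base R by flooring timestamps to the hour and aggregating. Toy data below; this is not a drop-in replacement for timeAverage's handling of gaps and sampling intervals:

```r
# Toy 15-min DO series spanning two hours
df <- data.frame(
  datetime = as.POSIXct("2017-01-01 00:00", tz = "UTC") + seq(0, 105 * 60, by = 15 * 60),
  do = c(7.0, 7.2, 7.1, 7.3, 8.0, 8.2, 8.1, 8.3)
)

# Floor each timestamp to the hour, then take the median per hour
df$hour <- as.POSIXct(trunc(df$datetime, units = "hours"))
hourly <- aggregate(do ~ hour, data = df, FUN = median)
```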

cc_do<-cc.1hr.med %>% ggplot(aes(x=date, y=do, color=do))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median dissolved oxygen from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

cc_do
## Warning: Removed 1461 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

cc.1hr.med$datetime<-as.Date(cc.1hr.med$datetime)

cc.1hr.med %>% ggplot(aes(x=datetime, y=do, color=do))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median dissolved oxygen from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1461 rows containing missing values (geom_point).

Remove any points < 4.
Not sure about the points above 12 or the cluster below 6 in Sept–Oct 2018; keeping those for now.

#adding month-year column just to make the next step of coding easier
cc.1hr.med$month_yr<-format(as.Date(cc.1hr.med$date), "%Y-%m")

cc.1hr.med<- cc.1hr.med[is.na(cc.1hr.med$do) | cc.1hr.med$do >= 4,]
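One subtlety with this kind of logical subsetting: where `do` is NA the condition is also NA, and base R returns an all-NA row rather than dropping or keeping the original. An explicit `is.na()` guard keeps the NA hours intact. A quick demonstration with toy data:

```r
df <- data.frame(do = c(3.5, NA, 6.0))

# Naive subsetting: the NA condition yields a junk all-NA row
naive <- df[!(df$do < 4), ]

# Guarding with is.na() keeps NA hours as-is and drops only true lows
guarded <- df[is.na(df$do) | df$do >= 4, ]
```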


cc.1hr.med %>%ggplot(aes(x=datetime, y=do, color=do))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median dissolved oxygen from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1461 rows containing missing values (geom_point).

Save

write.csv(cc.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/cc_do.csv")

####EOS pier####

Set up

eos<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/eos.csv",
    header = TRUE
  )


names(eos)[names(eos) == "time"] <- "datetime"

eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))
eos$datetime<-as.POSIXct(eos$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos<-subset(eos, select=c(date, datetime, mass_concentration_of_oxygen_in_sea_water, mass_concentration_of_oxygen_in_sea_water_qc_agg))

names(eos)[names(eos) == "mass_concentration_of_oxygen_in_sea_water"] <- "do"
names(eos)[names(eos) == "mass_concentration_of_oxygen_in_sea_water_qc_agg"] <- "do_f"


eos%>%
  ggplot(aes(x=datetime, y=do, color=do))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Plot with better view

eos%>%
  ggplot(aes(x=datetime, y=do, color=do))+
  geom_point(alpha=0.5)+
  ylim(0,15)+
  labs(title="Dissolved oxygen data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 1 rows containing missing values (geom_point).

Remove data that failed QC test, 4=failed

eos<-eos[!grepl("4", eos$do_f),]

eos%>%
  ggplot(aes(x=date, y=do, color=do))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from EOS research pier: data flagged fail removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Rolling median, 4-hour window

eos<-eos%>%
  tq_mutate(
    select= do,
    mutate_fun= rollapply,
    width= 40,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter and graph

eos<-filter(eos, do <hi.95 & do >lo.95)


ggplot(data = eos,
       mapping = aes(x = date, y = do, color = do)) + geom_point() +
  labs(title = "Dissolved oxygen data from EOS research pier: Extreme + +/- 2sd data removed",
       subtitle = "01/01/2017 - 12/31/2019",
       caption = "data courtesy of CeNCOOS")

Hourly median

names(eos)[1]<-"datex"
names(eos)[2]<-"date"
eos$date<-as.POSIXct(eos$date, format = c("%Y-%m-%dT%H:%M:%SZ"))
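The "Detected data with Daylight Saving Time" warnings come from timestamps parsed into the session's local time zone. The EOS stamps end in "Z", which suggests UTC (an assumption about the feed); parsing with an explicit `tz` sidesteps the DST conversion:

```r
stamp <- "2018-12-01T08:30:00Z"

# Without tz =, the stamp is interpreted in the session's local zone
local_t <- as.POSIXct(stamp, format = "%Y-%m-%dT%H:%M:%SZ")

# With tz = "UTC" the Z suffix is honored and no DST shifts apply
utc_t <- as.POSIXct(stamp, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
```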

eos.1hr.med<- timeAverage(eos,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.1hr.med<-subset(eos.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(eos.1hr.med)[1]<-"datetime"
names(eos.1hr.med)[2]<-"date"

eos_do<-eos.1hr.med %>% ggplot(aes(x=date, y=do, color=do))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median dissolved oxygen from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
eos_do
## Warning: Removed 1203 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

eos.1hr.med %>% ggplot(aes(x=date, y=do, color=do))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median dissolved oxygen from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 1203 rows containing missing values (geom_point).

Remove points Dec 2018 <5

#adding month-year column just to make the next step of coding easier
eos.1hr.med$month_yr<-format(as.Date(eos.1hr.med$date), "%Y-%m")

eos.1hr.med<- eos.1hr.med[is.na(eos.1hr.med$do) | !(eos.1hr.med$do< 5 & eos.1hr.med$month_yr=="2018-12"),]


eos.1hr.med %>%ggplot(aes(x=date, y=do, color=do))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median dissolved oxygen from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 1203 rows containing missing values (geom_point).

Save

write.csv(eos.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_do.csv")

####Richardson Bay ####

Set up

rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))

rb<-subset(rb, select=c(date, datetime, DO_mgl, F_DO_mgl, DateTimeStamp))

rb%>%
  ggplot(aes(x=datetime, y=DO_mgl, color=DO_mgl))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from Richardson Bay: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Remove flagged data that did not pass QAQC

rb<-rb[!grepl("-3", rb$F_DO_mgl),]


rb%>%
  ggplot(aes(x=datetime, y=DO_mgl, color=DO_mgl))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from Richardson Bay: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Rolling median

rb<-rb%>%
  tq_mutate(
    select= DO_mgl,
    mutate_fun= rollapply,
    width= 16,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter

rb<-filter(rb, DO_mgl <hi.95 & DO_mgl >lo.95)

rb%>%
  ggplot(aes(x=datetime, y=DO_mgl, color=DO_mgl))+
  geom_point(alpha=0.5)+
  labs(title="Dissolved oxygen data from Richardson Bay: Failed QC data + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 10 rows containing missing values (geom_point).

Hourly median

names(rb)[1]<-"datex"
names(rb)[2]<-"date"

rb$date<-as.POSIXct(rb$date, format = c("%m/%d/%y %H:%M"))


rb.1hr.med<- timeAverage(rb,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 10 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb.1hr.med<-subset(rb.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(rb.1hr.med)[1] <- "datetime"
names(rb.1hr.med)[2] <- "date"
names(rb.1hr.med)[3]<-"do"



rb_do<-rb.1hr.med%>%
  ggplot(aes(x=date, y=do, color=do))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median dissolved oxygen data from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
rb_do
## Warning: Removed 2103 rows containing missing values (geom_point).

Don’t see obvious points that need to be removed

Save

write.csv(rb.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/rb_do.csv")

####Fort Point #### No dissolved oxygen data at this station.

####Water temperature#### Clear environment except for our custom function

rm(list=(ls()[ls()!="custom_stat_fun_2"]))

####China Camp #### Set-up

cc<-
  read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/china_camp.csv",
    header = TRUE
  )

cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))

cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))

cc<-subset(cc, select=c(date, datetime,Temp, F_Temp))

cc%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from China Camp: All data points",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).

Remove data that failed QC test

cc<-cc[!grepl("-3", cc$F_Temp),]

cc%>%
  ggplot(aes(x=date, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from China Camp: Failed QC points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).

Rolling median

cc<-cc%>%
  tq_mutate(
    select= Temp,
    mutate_fun= rollapply,
    width= 16,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter

cc<-filter(cc, Temp <hi.95 & Temp >lo.95)

cc%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from China Camp: Failed QC + +/- 2sd points removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 7 rows containing missing values (geom_point).

Hourly median

names(cc)[1]<-"datex"
names(cc)[2]<-"date"
cc$date<-as.POSIXct(cc$date, format = c("%m/%d/%Y %H:%M"))

cc.1hr.med<- timeAverage(cc,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 7 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
cc.1hr.med<-subset(cc.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(cc.1hr.med)[1]<-"datetime"
names(cc.1hr.med)[2]<-"date"
names(cc.1hr.med)[3]<-"water_temp"

cc_temp<-cc.1hr.med %>% ggplot(aes(x=date, y=water_temp, color=water_temp))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median water temperature from China Camp",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")

cc_temp
## Warning: Removed 2798 rows containing missing values (geom_point).

Not sure whether any points need to be removed

Format and save

write.csv(cc.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/cc_watertemp.csv")

####EOS pier ####

Set up

eos<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/eos.csv",
    header = TRUE
  )


eos$date<-as.Date(eos$date, format = c("%Y-%m-%d"))
eos$datetime<-as.POSIXct(eos$time, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos<-subset(eos, select=c(date, datetime, sea_water_temperature, sea_water_temperature_qc_agg))

eos%>%
  ggplot(aes(x=datetime, y=sea_water_temperature, color=sea_water_temperature))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from EOS research pier: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Remove data that failed QC test, 4=failed

eos<-eos[!grepl("4", eos$sea_water_temperature_qc_agg),]

eos%>%
  ggplot(aes(x=date, y=sea_water_temperature, color=sea_water_temperature))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from EOS research pier: data flagged fail removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Rolling median

eos<-eos%>%
  tq_mutate(
    select= sea_water_temperature,
    mutate_fun= rollapply,
    width= 40,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter and graph

eos<-filter(eos, sea_water_temperature <hi.95 & sea_water_temperature >lo.95)


ggplot(data = eos,
       mapping = aes(x = date, y = sea_water_temperature, color = sea_water_temperature)) + geom_point() +
  labs(title = "Water temperature data from EOS research pier: Extreme values & +/- 2sd data removed",
       subtitle = "01/01/2017 - 12/31/2019",
       caption = "data courtesy of CeNCOOS")

Hourly median

names(eos)[1]<-"datex"
names(eos)[2]<-"date"

eos$date<-as.POSIXct(eos$date, format = c("%Y-%m-%dT%H:%M:%SZ"))

eos.1hr.med<- timeAverage(eos,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.1hr.med<-subset(eos.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(eos.1hr.med)[1]<-"datetime"
names(eos.1hr.med)[2]<-"date"
names(eos.1hr.med)[3]<-"water_temp"

eos_temp<-eos.1hr.med %>% ggplot(aes(x=date, y=water_temp, color=water_temp))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median water temperature from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
eos_temp
## Warning: Removed 2203 rows containing missing values (geom_point).

Remove points that still don’t look right. First graph it so it’s easier to see the months.

eos.1hr.med %>% ggplot(aes(x=date, y=water_temp, color=water_temp))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median water temp from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 2203 rows containing missing values (geom_point).

Remove points Dec 2018 >25

#adding month-year column just to make the next step of coding easier
eos.1hr.med$month_yr<-format(as.Date(eos.1hr.med$date), "%Y-%m")

eos.1hr.med<- eos.1hr.med[is.na(eos.1hr.med$water_temp) | !(eos.1hr.med$water_temp> 25 & eos.1hr.med$month_yr=="2018-12"),]


eos.1hr.med %>%ggplot(aes(x=date, y=water_temp, color=water_temp))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median water temp from EOS research pier",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 2203 rows containing missing values (geom_point).

Save

write.csv(eos.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_watertemp.csv")

####Richardson Bay####

Set up

rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)

rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))

rb<-subset(rb, select=c(date, datetime, Temp, F_Temp))

rb%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Richardson Bay: All data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Remove flagged data that did not pass QAQC

rb<-rb[!grepl("-3",rb$F_Temp),]

rb%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Richardson Bay: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).

Rolling median

rb<-rb%>%
  tq_mutate(
    select= Temp,
    mutate_fun= rollapply,
    width= 16,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique

Filter

rb<-filter(rb, Temp <hi.95 & Temp >lo.95)

rb%>%
  ggplot(aes(x=datetime, y=Temp, color=Temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Richardson Bay: Failed QC data + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
## Warning: Removed 12 rows containing missing values (geom_point).

Hourly median

names(rb)[1]<-"datex"
names(rb)[2]<-"date"
rb$date<-as.POSIXct(rb$date, format = c("%m/%d/%y %H:%M"))


rb.1hr.med<- timeAverage(rb,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "15 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 12 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb.1hr.med<-subset(rb.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(rb.1hr.med)[1]<-"datetime"
names(rb.1hr.med)[2]<-"date"
names(rb.1hr.med)[3]<-"water_temp"



rb_temp<-rb.1hr.med%>%
  ggplot(aes(x=date, y=water_temp, color=water_temp))+
  geom_point(alpha=0.5)+
  labs(title="Hourly median water temperature data from Richardson Bay",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of NERR")
rb_temp
## Warning: Removed 3494 rows containing missing values (geom_point).

Don’t see points that need to be removed

Format and save

write.csv(rb.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/rb_watertemp.csv")

####Fort Point ####

Set up

fp<-read.csv(
      "https://raw.githubusercontent.com/Cmwegener/thesis/master/data/environmental/raw/fort_point.csv",
    header = TRUE
  )
 
fp$date<-as.Date(fp$date, format = c("%Y-%m-%d"))
names(fp)[names(fp) == "sea_water_temperature"] <- "temp"
names(fp)[names(fp) == "sea_water_temperature_qc_agg"] <- "F_temp"

fp<-subset(fp, select=c(datetime, date, temp, F_temp))

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Fort Point: All Data",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Remove flagged data, 4 indicates fail.

fp<-filter(fp, F_temp!=4)

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Fort Point: Failed QC data removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Rolling median

fp<-fp%>%
  tq_mutate(
    select= temp,
    mutate_fun= rollapply,
    width= 40,
    align= "right",
    by.column= FALSE,
    FUN= custom_stat_fun_2,
    na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
fp<-filter(fp, temp <hi.95 & temp >lo.95)

fp%>%
  ggplot(aes(x=date, y=temp, color=temp))+
  geom_point(alpha=0.5)+
  labs(title="Water temperature data from Fort Point: Failed QC + +/- 2sd removed",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")

Hourly median

names(fp)[1]<-"date"
names(fp)[2]<-"datex"
fp$date<-as.POSIXct(fp$date, format = c("%Y-%m-%dT%H:%M:%SZ"))

fp.1hr.med<- timeAverage(fp,
            avg.time="1 hour",
            data.thresh = 0,
            statistic = "median",
            interval = "6 min",
            remove.calm = FALSE,
            type = "default")
## Warning: Missing dates detected, removing 10 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
fp.1hr.med<-subset(fp.1hr.med, select=(-c(median, stdev, hi.95, lo.95)))

names(fp.1hr.med)[1]<-"datetime"
names(fp.1hr.med)[2]<-"date"
names(fp.1hr.med)[3]<-"water_temp"



fp_temp<-ggplot(data = fp.1hr.med, mapping = aes(x = date, y = water_temp, color=water_temp)) + 
  geom_point()+
    labs(title="Hourly median water temperature data from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
fp_temp
## Warning: Removed 5037 rows containing missing values (geom_point).

Looks like points < 9 need to be removed

Remove points

#adding month-year column just to make the next step of coding easier
fp.1hr.med$month_yr<-format(as.Date(fp.1hr.med$date), "%Y-%m")

fp.1hr.med<- fp.1hr.med[is.na(fp.1hr.med$water_temp) | fp.1hr.med$water_temp >= 9,]


fp.1hr.med %>%ggplot(aes(x=date, y=water_temp, color=water_temp))+
  geom_point(alpha=0.5)+
  scale_x_date(date_breaks = "6 month", date_labels = "%b %Y", date_minor_breaks = "1 month")+
  labs(title="Hourly median water temp from Fort Point",
       subtitle="01/01/2017 - 12/31/2019",
       caption= "data courtesy of CeNCOOS")
## Warning: Removed 5037 rows containing missing values (geom_point).

Save

write.csv(fp.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fp_watertemp.csv")
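One small note on the write.csv() saves throughout: by default write.csv() prepends a row-name column, which comes back as an unnamed "X" column on re-import. Passing `row.names = FALSE` keeps the files tidy; a sketch with a temporary file:

```r
df <- data.frame(datetime = "2017-01-01 00:00", water_temp = 11.2)
f <- tempfile(fileext = ".csv")

# row.names = FALSE suppresses the leading row-name column
write.csv(df, f, row.names = FALSE)
back <- read.csv(f)
names(back)   # no extra "X" column
```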